Results 1 - 20 of 43
1.
J Comput Biol ; 31(4): 328-344, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38271573

ABSTRACT

Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson-Foulds (RF) distance is widely used for comparing species trees, its applicability to labeled trees reveals certain limitations. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees with different sizes or different labels.


Subjects
Algorithms, Humans, Neoplasms/genetics, Mutation, Computational Biology/methods, Phylogeny
2.
Front Microbiol ; 13: 827742, 2022.
Article in English | MEDLINE | ID: mdl-35910656

ABSTRACT

Knowledge of virus-host interactomes has advanced exponentially in the last decade by the use of high-throughput screening technologies to obtain a more comprehensive landscape of virus-host protein-protein interactions. In this article, we present a systematic review of the available virus-host protein-protein interaction database resources. The resources covered in this review are both generic virus-host protein-protein interaction databases and databases of protein-protein interactions for a specific virus or for those viruses that infect a particular host. The databases are reviewed on the basis of the specificity for a particular virus or host, the number of virus-host protein-protein interactions included, and the functionality in terms of browse, search, visualization, and download. Further, we also analyze the overlap of the databases, that is, the number of virus-host protein-protein interactions shared by the various databases, as well as the structure of the virus-host protein-protein interaction network, across viruses and hosts.

3.
J Comput Biol ; 28(12): 1181-1195, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34714118

ABSTRACT

The Robinson-Foulds (RF) distance is one of the most widely used metrics for comparing phylogenetic trees. It is intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time. However, it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, that it can also be computed in linear time, and that it is not (uniformly) equivalent to the RF distance.
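
To make the cluster description concrete, the following is a minimal sketch of the classical RF distance on rooted trees, taken here as the size of the symmetric difference of the two cluster sets (some authors halve this value). The nested-tuple tree encoding and the function names are assumptions made for illustration, and this naive version does not reproduce the linear-time algorithm or the GRF distance discussed in the abstract.

```python
def clusters(tree):
    """Return the set of clusters (sets of leaf labels) of a rooted tree.

    Hypothetical encoding: a leaf is a string, an internal node is a tuple
    of its children.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(rf_distance(balanced, caterpillar))  # clusters {c,d} and {a,b,c} differ
```

With only four leaves the distance can take very few distinct values, which hints at the low resolution mentioned in the abstract.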


Subjects
Classification/methods, Computational Biology/methods, Algorithms, Genetic Models, Phylogeny
4.
PLoS One ; 15(12): e0236304, 2020.
Article in English | MEDLINE | ID: mdl-33284827

ABSTRACT

MOTIVATION: Besides its socio-economic impact, the COVID-19 pandemic, the infectious disease caused by the newly discovered coronavirus SARS-CoV-2, has had a deep impact on the scientific community, which has considerably increased its efforts to discover the infection strategies of the new virus. Among the extensive and crucial research carried out in recent months, the analysis of the virus-host relationship plays an important role in drug discovery. Virus-host protein-protein interactions are the active agents in virus replication, and the analysis of virus-host protein-protein interaction networks is fundamental to the study of the virus-host relationship. RESULTS: We have adapted and implemented a recent integer linear programming model for protein-protein interaction network alignment to virus-host networks, and obtained a consensus alignment of the SARS-CoV-1 and SARS-CoV-2 virus-host protein-protein interaction networks. Despite the lack of shared human proteins in these virus-host networks, and the low number of preserved virus-host interactions, the consensus alignment revealed aligned human proteins that share a function related to viral infection, as well as human proteins of high functional similarity that interact with SARS-CoV-1 and SARS-CoV-2 proteins, whose alignment would preserve these virus-host interactions.


Subjects
Host Microbial Interactions/physiology, Protein Interaction Maps/physiology, SARS-CoV-2/metabolism, COVID-19/virology, Coronavirus/metabolism, Coronavirus Infections/virology, Humans, Theoretical Models, Pandemics, Viral Pneumonia/virology, Linear Programming, Protein Binding/physiology, Proteins/metabolism, Coronavirus Spike Glycoprotein/metabolism, Virus Replication/physiology
5.
BMC Bioinformatics ; 21(Suppl 6): 434, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33203352

ABSTRACT

BACKGROUND: The alignment of protein-protein interaction networks was recently formulated as an integer quadratic programming problem, along with a linearization that can be solved by integer linear programming software tools. However, the resulting integer linear program has a huge number of variables and constraints, rendering it of no practical use. RESULTS: We present a compact integer linear programming reformulation of the protein-protein interaction network alignment problem, which can be solved using state-of-the-art mathematical modeling and integer linear programming software tools, along with empirical results showing that small biological networks, such as virus-host protein-protein interaction networks, can be aligned in a reasonable amount of time on a personal computer and the resulting alignments are structurally coherent and biologically meaningful. CONCLUSIONS: The implementation of the integer linear programming reformulation using current mathematical modeling and integer linear programming software tools provided biologically meaningful alignments of virus-host protein-protein interaction networks.


Subjects
Linear Programming, Protein Interaction Maps, Software, Algorithms, Theoretical Models
6.
BMC Bioinformatics ; 21(Suppl 6): 265, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33203353

ABSTRACT

BACKGROUND: All molecular functions and biological processes are carried out by groups of proteins that interact with each other. Metaproteomic data continuously generates new proteins whose molecular functions and relations must be discovered. A widely accepted structure for modeling functional relations between proteins is the protein-protein interaction network (PPIN), and the analysis and alignment of PPINs have become a key ingredient in the study and prediction of protein-protein interactions, protein function, and evolutionarily conserved assembly pathways of protein complexes. Several PPIN aligners have been proposed, but attaining the right balance between network topology and biological information is one of the most difficult and key points in the design of any PPIN alignment algorithm. RESULTS: Motivated by the challenge of well-balanced and efficient algorithms, we have designed and implemented AligNet, a parameter-free pairwise PPIN alignment algorithm aimed at bridging the gap between topologically efficient and biologically meaningful matchings. A comparison of the results obtained with AligNet and with the best aligners shows that AligNet indeed achieves a good balance between topological and biological matching. CONCLUSION: In this paper we present AligNet, a new pairwise global PPIN aligner that produces biologically meaningful alignments by achieving a good balance between structural matching and protein function conservation, with more efficient computations than state-of-the-art tools.


Subjects
Protein Interaction Mapping, Protein Interaction Maps, Proteins, Algorithms, Biological Evolution, Proteins/metabolism
7.
J Math Biol ; 79(3): 1105-1148, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31209515

ABSTRACT

We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutionary history of every set of 4 leaves. This index makes sense for multifurcating trees and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford's α-model and Aldous' β-model and on arbitrary trees under the α-γ-model.


Subjects
Algorithms, Biological Evolution, Mathematical Concepts, Biological Models, Phylogeny, Animals, Humans
8.
J Comput Biol ; 25(3): 348-360, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29028181

ABSTRACT

The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach to taxonomic annotation in a next release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample.
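
As an illustration of the set cover reduction, here is a minimal sketch of the standard greedy heuristic, which achieves a logarithmic approximation ratio. The read identifiers, the taxon names, and the dictionary layout are hypothetical, and this straightforward loop is not the linear-time implementation the abstract refers to, nor the TANGO code itself.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of the universe is covered.

    `subsets` maps a candidate name to the set of elements it covers.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the universe cannot be covered")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical example: each taxonomy node is a candidate for some reads.
reads = {"r1", "r2", "r3", "r4", "r5"}
candidates = {
    "Escherichia": {"r1", "r2", "r3"},
    "Salmonella": {"r3", "r4"},
    "Shigella": {"r4", "r5"},
    "Klebsiella": {"r5"},
}
print(greedy_set_cover(reads, candidates))  # ['Escherichia', 'Shigella']
```

Choosing a small set of nodes that explains all reads is what reduces the ambiguity caused by several equally good nodes per read.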


Subjects
Taxonomic DNA Barcoding/methods, Metagenome, Phylogeny, Software, Taxonomic DNA Barcoding/standards, Humans, Microbiota
9.
PLoS One ; 11(6): e0157383, 2016.
Article in English | MEDLINE | ID: mdl-27299312

ABSTRACT

Currently, there is very little information available regarding the microbiome associated with the wine production chain. Here, we used an amplicon sequencing approach based on high-throughput sequencing (HTS) to obtain a comprehensive assessment of the bacterial community associated with the production of three Apulian red wines, from grape to final product. The relationships among grape variety, the microbial community, and fermentation were investigated. Moreover, the winery microbiota was evaluated in comparison with the autochthonous species in vineyards that persist until the end of the winemaking process. The analysis highlighted the remarkable dynamics within the microbial communities during fermentation. A common microbial core shared among the examined wine varieties was observed, and the unique taxonomic signature of each wine appellation was revealed. New species belonging to the genus Halomonas were also reported. This study demonstrates the potential of this metagenomic approach, supported by optimized protocols, for identifying the biodiversity of the wine supply chain. The developed experimental pipeline offers new prospects for other research fields in which a comprehensive view of microbial community complexity and dynamics is desirable.


Subjects
Bacteria/genetics, Fungi/genetics, Vitis/microbiology, Wine/microbiology, Bacteria/classification, Bacteria/isolation & purification, Fermentation, Fruit/microbiology, Fungi/classification, Fungi/isolation & purification, High-Throughput Screening Assays, Metagenomics, Microbiota
10.
BMC Bioinformatics ; 16: 203, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26130132

ABSTRACT

BACKGROUND: Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to Metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to proper computational systems for the large-scale bioinformatic analysis of HTS data currently represents one of the major challenges in advanced Meta-barcoding projects. RESULTS: BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities through a completely automated workflow comprising all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). CONCLUSION: BioMaS is a friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.


Subjects
Bacteria/genetics, Computational Biology/methods, Fungi/genetics, High-Throughput Nucleotide Sequencing/methods, Metagenomics, Software, Biodiversity
11.
ScientificWorldJournal ; 2014: 254279, 2014.
Article in English | MEDLINE | ID: mdl-24982934

ABSTRACT

Several polynomial time computable metrics on the class of semibinary tree-sibling time consistent phylogenetic networks are available in the literature; in particular, the problem of deciding if two networks of this kind are isomorphic is in P. In this paper, we show that if we remove the semibinarity condition, then the problem becomes much harder. More precisely, we prove that the isomorphism problem for generic tree-sibling time consistent phylogenetic networks is polynomially equivalent to the graph isomorphism problem. Since the latter is believed not to belong to P, the chances are that it is impossible to define a metric on the class of all tree-sibling time consistent phylogenetic networks that can be computed in polynomial time.


Subjects
Algorithms, Phylogeny, Computational Biology, Humans
12.
Bioinformatics ; 30(1): 17-23, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-23645816

ABSTRACT

MOTIVATION: TANGO is one of the most accurate tools for the taxonomic assignment of sequence reads. However, because of the differences in the taxonomy structures, performing a taxonomic assignment on different reference taxonomies will produce divergent results. RESULTS: We have improved the TANGO pipeline to be able to perform the taxonomic assignment of a metagenomic sample using alternative reference taxonomies coming from different sources. We highlight the novel pre-processing step necessary to accomplish this task, and describe the improvements in the assignment process. We present the new TANGO pipeline in detail and, finally, we show its performance on four real metagenomic datasets and also on synthetic datasets. AVAILABILITY: The new version of TANGO, including implementation improvements and novel developments to perform the assignment on different reference taxonomies, is freely available at http://sourceforge.net/projects/taxoassignment/.


Subjects
Metagenomics/methods, Software, Algorithms, Metagenomics/classification
14.
Brief Bioinform ; 13(6): 682-95, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22786784

ABSTRACT

Metagenomics is providing unprecedented access to environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus that meets several requirements. Namely, it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of suitable size to be sequenced through next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hyper-variable regions of the 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of ribosomal internal transcribed spacer reference databases is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.


Subjects
Genetic Databases, Metagenomics/methods, Algorithms, Fungi/classification, Fungi/genetics, rRNA Genes, 16S Ribosomal RNA/genetics, 16S Ribosomal RNA/metabolism
15.
Brief Bioinform ; 12(6): 614-25, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21504986

ABSTRACT

Next-generation sequencing technologies have opened up an unprecedented opportunity for microbiology by enabling the culture-independent genetic study of complex microbial communities, which were so far largely unknown. The analysis of metagenomic data is challenging: potentially, one is faced with a sample containing a mixture of many different bacterial species whose genomes have not necessarily been sequenced beforehand. In the simpler case of the analysis of 16S ribosomal RNA metagenomic data, for which databases of reference sequences are known, we survey the computational challenges to be solved in order to characterize and quantify a sample. In particular, we examine two aspects: how the necessary adoption of new tools geared towards high-throughput analysis impacts the quality of the results, and how well various established methods assign sequence reads to microbial species, with and without taking taxonomic information into account.


Subjects
Metagenomics/methods, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Bacterial DNA/chemistry, Metagenome, 16S Ribosomal RNA/chemistry
16.
BMC Bioinformatics ; 12: 8, 2011 Jan 07.
Article in English | MEDLINE | ID: mdl-21211059

ABSTRACT

BACKGROUND: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match them. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. RESULTS: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. CONCLUSIONS: The assignment strategy for sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which in turn depends on the value of the q parameter. Validation of our results on an artificial dataset confirms that a combination of values of q produces the most accurate results.
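
The baseline strategy criticized in the background, assigning an ambiguous read to the lowest common ancestor (LCA) of its matching species, can be sketched as follows. The child-to-parent dictionary, the toy taxonomy, and the function names are assumptions for illustration; the sketch walks root paths rather than using constant-time LCA queries, and it does not implement the penalty score or the q parameter of the method in the abstract.

```python
def path_from_root(node, parent):
    """Return the list of nodes from the root down to `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]


def lowest_common_ancestor(nodes, parent):
    """Deepest taxonomy node that is an ancestor of every node in `nodes`."""
    paths = [path_from_root(node, parent) for node in nodes]
    lca = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# Hypothetical toy taxonomy given as child -> parent.
parent = {
    "Enterobacteriaceae": "Bacteria",
    "Bacillaceae": "Bacteria",
    "Escherichia coli": "Enterobacteriaceae",
    "Salmonella enterica": "Enterobacteriaceae",
    "Bacillus subtilis": "Bacillaceae",
}

matches = ["Escherichia coli", "Salmonella enterica"]
print(lowest_common_ancestor(matches, parent))  # Enterobacteriaceae
```

If a read also matched Bacillus subtilis, the LCA would jump to the root, implicitly covering every species in the taxonomy; this is the false-positive effect the abstract describes.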


Subjects
Bacteria/classification, Computational Biology/methods, Metagenomics, DNA Sequence Analysis/methods, Algorithms, Bacteria/genetics, Bacterial DNA/genetics
17.
Article in English | MEDLINE | ID: mdl-20660951

ABSTRACT

Galled trees, directed acyclic graphs that model evolutionary histories with isolated hybridization events, have become very popular due to both their biological significance and the existence of polynomial-time algorithms for their reconstruction. In this paper, we establish to what extent several distance measures for the comparison of evolutionary networks are metrics for galled trees, and hence, when they can be safely used to evaluate galled tree reconstruction methods.


Subjects
Phylogeny, Computational Biology/methods, Molecular Evolution, Gene Expression Profiling/methods, Genetic Hybridization, Genetic Models
18.
BMC Bioinformatics ; 11: 268, 2010 May 20.
Article in English | MEDLINE | ID: mdl-20487540

ABSTRACT

BACKGROUND: Typical evolutionary events like recombination, hybridization or gene transfer make it necessary to use phylogenetic networks to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions that will likely not be met by the evolutionary process. We have recently shown that the complexity of simulated networks is a function of the population recombination rate, and that at moderate and large recombination rates the resulting networks cannot be categorized. However, we do not know whether these results extend to networks estimated from real data. RESULTS: We introduce a web server for the categorization of explicit phylogenetic networks, including the most relevant theoretical classes developed so far. Using this tool, we analyzed statistical parsimony phylogenetic networks estimated from approximately 5,000 DNA alignments, obtained from the NCBI PopSet and Polymorphix databases. The level of characterization was correlated with nucleotide diversity, and a high proportion of the networks derived from these data sets could be formally characterized. CONCLUSIONS: We have developed a public web server, NetTest (freely available from the software section at http://darwin.uvigo.es), to formally characterize the complexity of phylogenetic networks. Using NetTest we found that most statistical parsimony networks estimated with the program TCS could be assigned to a known network class. The level of network characterization was correlated with nucleotide diversity and dependent upon the intra/interspecific levels, although no significant differences were detected among genes. More research on the properties of phylogenetic networks is clearly needed.


Subjects
Phylogeny, Software, Genetic Databases, Molecular Evolution, Genetic Hybridization
19.
BMC Bioinformatics ; 11: 138, 2010 Mar 17.
Article in English | MEDLINE | ID: mdl-20236520

ABSTRACT

BACKGROUND: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised TOPS+ method the advanced TOPS+ comparison method, i.e. advTOPS+. RESULTS: We have developed a TOPS+ string model as an improvement to the TOPS [1, 2, 3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. CONCLUSIONS: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90% accuracy for SCOP alpha+beta, a 6% increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98% accuracy. SOFTWARE AVAILABILITY: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.


Subjects
Computational Biology/methods, Proteins/chemistry, Software, Algorithms, Protein Databases, Ligands, Molecular Models, Protein Conformation, Protein Folding
20.
J Math Biol ; 61(2): 253-276, 2010 Aug.
Article in English | MEDLINE | ID: mdl-19760227

ABSTRACT

Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path lengths matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in spaces $M_n(\mathbb{R})$ of real-valued $n \times n$ matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on $M_n(\mathbb{R})$, with $p \in \mathbb{R}_{>0}$.
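
One plausible reading of the splitting step, offered here only as an assumption since the abstract does not spell it out, is to cut each path between two taxa at their lowest common ancestor and record the two halves separately. The sketch below does this for non-weighted trees with taxa on the leaves only; the nested-tuple encoding and all names are hypothetical, and nested taxa and arc weights are not handled.

```python
from itertools import combinations


def root_paths(tree, prefix=()):
    """Map each leaf label to the tuple of child indices leading to it."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (index,)))
    return paths


def split_path_lengths(tree):
    """For each ordered pair (a, b), count the arcs from LCA(a, b) down to a."""
    paths = root_paths(tree)
    lengths = {}
    for a, b in combinations(sorted(paths), 2):
        path_a, path_b = paths[a], paths[b]
        shared = 0  # number of arcs the two root paths have in common
        while (shared < min(len(path_a), len(path_b))
               and path_a[shared] == path_b[shared]):
            shared += 1
        lengths[(a, b)] = len(path_a) - shared
        lengths[(b, a)] = len(path_b) - shared
    return lengths


tree = ((("a", "b"), "c"), "d")
lengths = split_path_lengths(tree)
print(lengths[("a", "d")], lengths[("d", "a")])  # 3 1
```

The ordinary path length between two taxa is the sum of the two entries, so the split matrix carries strictly more information than the classical vector of path lengths.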


Subjects
Genetic Models, Phylogeny, Algorithms, Statistical Distributions